Review for NeurIPS paper: Learning Kernel Tests Without Data Splitting
Additional Feedback:

Minor comments:
- l. 55: the observed test statistic \hat{\tau} is never really defined, in particular w.r.t. This might be a bit confusing.
- I think Lemma 1 needs rephrasing. In particular, by definition the expectation of \tau is already that of h, so one does not need to assume anything here, and the variance of \tau is \sigma^2 / n. Since \tau is already defined in the previous paragraph, writing "Let \mu denote E[h] and \sigma^2 = Var(h)" seems enough.
- Missing \mid H_A in Eq. (1)?
- l. 137: the e_j vectors are not defined, although it remains understandable.
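The variance remark in the Lemma 1 comment can be written out explicitly. A sketch of the suggested rephrasing, assuming (as the comment implies) that \hat{\tau} is the empirical mean of n i.i.d. evaluations h_1, \dots, h_n of h:

```latex
\mu := \mathbb{E}[h], \qquad \sigma^2 := \mathrm{Var}(h), \qquad
\hat{\tau} = \frac{1}{n}\sum_{i=1}^{n} h_i
\quad\Longrightarrow\quad
\mathbb{E}[\hat{\tau}] = \mu, \qquad \mathrm{Var}(\hat{\tau}) = \frac{\sigma^2}{n}.
```

Under this reading, no extra assumption on \mathbb{E}[\tau] is needed; only the definitions of \mu and \sigma^2 are required.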
Learning Kernel Tests Without Data Splitting
Modern large-scale kernel-based tests such as maximum mean discrepancy (MMD) and kernelized Stein discrepancy (KSD) optimize kernel hyperparameters on a held-out sample via data splitting to obtain the most powerful test statistics. While data splitting results in a tractable null distribution, it suffers from a reduction in test power due to the smaller test sample size. Inspired by the selective inference framework, we propose an approach that enables learning the hyperparameters and testing on the full sample without data splitting. Our approach correctly calibrates the test in the presence of the dependency introduced by learning and testing on the same data, and yields a test threshold in closed form. At the same significance level, our approach's test power is empirically larger than that of the data-splitting approach, regardless of its split proportion.
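The data-splitting baseline the abstract contrasts against can be sketched as follows. This is a minimal illustration, not the paper's method: it assumes a Gaussian-kernel MMD statistic, bandwidth selection by maximizing the statistic on the held-out half (a crude proxy for the power criterion typically optimized), and permutation calibration on the remaining half. All function names here are hypothetical.

```python
import numpy as np

def mmd2_biased(X, Y, bw):
    """Biased estimate of squared MMD with a Gaussian kernel of bandwidth bw."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * bw ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

def split_test(X, Y, bandwidths, n_perm=200, alpha=0.05, seed=0):
    """Data-splitting MMD test: choose the bandwidth on one half of the
    sample, then calibrate and test on the other half via permutations."""
    rng = np.random.default_rng(seed)
    n = len(X) // 2
    X_sel, X_te = X[:n], X[n:]
    Y_sel, Y_te = Y[:n], Y[n:]
    # Select the bandwidth that maximizes the statistic on the held-out half.
    bw = max(bandwidths, key=lambda b: mmd2_biased(X_sel, Y_sel, b))
    stat = mmd2_biased(X_te, Y_te, bw)
    # Permutation null computed on the test half only; the selection half is
    # "spent", which is the power loss the paper's approach avoids.
    Z = np.vstack([X_te, Y_te])
    m = len(X_te)
    null = []
    for _ in range(n_perm):
        idx = rng.permutation(len(Z))
        null.append(mmd2_biased(Z[idx[:m]], Z[idx[m:]], bw))
    threshold = np.quantile(null, 1 - alpha)
    return stat > threshold, stat, threshold

rng = np.random.default_rng(1)
X = rng.normal(0.0, 1.0, size=(120, 1))  # P
Y = rng.normal(1.0, 1.0, size=(120, 1))  # Q, mean-shifted
reject, stat, thr = split_test(X, Y, bandwidths=[0.5, 1.0, 2.0])
print(reject, stat, thr)
```

The paper's approach instead learns the kernel and tests on the full sample, correcting the threshold for the selection step rather than discarding half the data.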